Unless otherwise noted, all R code can be found at the end

Question 1

Part a

Plots of January 2014 electricity demand in Victoria against daily maximum temperature and day of the month are displayed below. The plot of electricity demand versus day of the month is not particularly insightful beyond showing that demand varied throughout the month. Of note, to simplify model interpretation, electricity demand figures were divided by \(1,000\) (i.e., converted from MW to GW).

Electricity demand displays a positive relationship with temperature. A regression model, summarized below, confirms this relationship, suggesting that daily electricity demand increases by approximately \(6,500\) MW (\(6.5\) GW) for every 1 degree Celsius increase in maximum temperature. According to the dataset documentation, temperatures are measured in Melbourne, Australia. This positive relationship is expected: as temperatures rise in January (summer in Victoria), people are more likely to use air conditioning, and for longer periods of time.

## Series: Demand 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -124.052  -13.226    4.465   26.386   36.075 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  42.5831    22.8963   1.860   0.0727 .  
## Temperature   6.4907     0.7903   8.213 3.62e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 33.13 on 30 degrees of freedom
## Multiple R-squared: 0.6922,  Adjusted R-squared: 0.6819
## F-statistic: 67.46 on 1 and 30 DF, p-value: 3.6162e-09

Part b

Residual plots of the model show no significant pattern in the residuals over time, and autocorrelation within reasonable bounds. Although the residuals appear left-skewed, suggesting a potential tendency of the model to substantially overpredict electricity usage, this appears to be driven by a single 25-degree day with lower-than-usual demand. Given the small number of observations used to build the model, it would be helpful to add January observations from other years. Doing so would help confirm the potentially higher prediction variability at lower maximum temperatures; such variability would not be unexpected, as a maximum temperature of 25 degrees Celsius does not necessarily indicate a hot day overall.

Overall, the residual plots suggest the model is adequate, barring other considerations such as \(CV\), \(AIC_{c}\), and/or \(BIC\). This position is further supported by plots of residuals against the Temperature predictor and against the model's fitted values, which display no relationship and relatively constant variance. Nevertheless, using daily average temperature instead of the daily maximum may have produced stronger results.

Part c

The point forecast of total electricity demand for a day with a maximum temperature of \(15^{\circ}C\) is \(140\) GW (\(140,000\) MW), while for a day with a maximum temperature of \(35^{\circ}C\) it is \(270\) GW (\(270,000\) MW).

It is likely that the forecast for the \(35^{\circ}C\) day is the more accurate of the two, as the overall daily temperature was probably high as well. The forecast for the \(15^{\circ}C\) day should be viewed with more skepticism. First, the model likely has higher prediction variability for days with lower maximum temperatures, since the maximum reveals less about the temperature over the rest of the day. Second, \(15^{\circ}C\) falls outside the range of temperatures used to train the model, so any forecast at that temperature is an extrapolation and inherently less reliable.
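A quick check of the training data's temperature range supports this point; a minimal sketch, using the jan14_vic_elec data built in the appendix code:

```r
# Range of daily maximum temperatures used to fit the model; if 15 degrees
# lies below the minimum, the Temp15 forecast is an extrapolation
range(jan14_vic_elec$Temperature)
```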

Part d

Prediction intervals for the forecasts performed in Part c are shown below.

## [1] "Temp15 Demand (GW) Forecast, 80% Pred. Interval -  [94.86, 185.02]80"
## [1] "Temp15 Demand (GW) Forecast, 95% Pred. Interval -  [71, 208.89]95"
## [1] "Temp35 Demand (GW) Forecast, 80% Pred. Interval -  [226.07, 313.45]80"
## [1] "Temp35 Demand (GW) Forecast, 95% Pred. Interval -  [202.94, 336.58]95"

Part e

A full plot of Victoria electricity demand versus Victoria daily maximum temperature is shown below.

The plot appears to display a quadratic relationship between electricity demand and temperature throughout a year. This finding is not unexpected; people demand air conditioning in hotter weather and heat in colder weather. Unsurprisingly, the model built to predict January electricity demand is unreliable throughout the year as it would over-predict electricity demand during times of more moderate temperatures and under-predict electricity demand during times of cold temperatures.
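A quadratic specification could capture this U-shaped relationship. A minimal sketch, assuming a hypothetical daily tsibble vic_elec_daily with Demand and Temperature columns aggregated as in Question 1:

```r
# The squared term allows predicted demand to rise at both temperature extremes
vic_elec_daily %>%
  model(tslm = TSLM(Demand ~ Temperature + I(Temperature^2)))
```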

Question 2

Part a

A history of winning Olympic race times by race distance and sex is shown in the plots below. Of note, data from 1916 were excluded, as the Olympic Games were not held that year due to World War I. Overall, the plots demonstrate a downward trend in winning times for every event, especially for the men. Unfortunately, data for women are somewhat inconsistent, with some events not open to women racers until later in history; nevertheless, the trend is generally the same for women.

The graphs tend to show a level of convexity; this finding is not wholly surprising as there is theoretically a true minimum time in which a race can be finished. Early reductions in time are generally larger than those seen in more recent history.

Part b

Winning times have been decreasing at an average rate of 2.34 seconds per year when averaged across all racing events. A summary of the average rate of change per year by race is shown in the table below.

##    Race Description Avg. Change (sec)
## 1       Race100_men             -0.05
## 2       Race200_men              -0.1
## 3       Race400_men             -0.26
## 4       Race800_men             -0.61
## 5      Race1500_men             -1.26
## 6      Race5000_men             -4.12
## 7     Race10000_men            -10.66
## 8     Race100_women             -0.06
## 9     Race200_women             -0.13
## 10    Race400_women             -0.16
## 11    Race800_women             -0.79
## 12   Race1500_women              0.59
## 13   Race5000_women             -1.21
## 14  Race10000_women            -13.98
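A sketch of how these averages might be computed, assuming the olympic_running tsibble from the fpp3 package (keyed by Length and Sex, with a Year index and Time in seconds):

```r
library(fpp3)

# Average year-on-year change in winning time for each race and sex
olympic_running %>%
  as_tibble() %>%
  filter(!is.na(Time)) %>%
  arrange(Year) %>%
  group_by(Length, Sex) %>%
  summarise(avg_change = mean(diff(Time) / diff(Year)), .groups = "drop")
```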

Part c

The plots below show residuals against year for the regression associated with each race. Residual results are mixed, with several races displaying the effects of the convexity discussed in Part a. For example, in the men's 1500m, the model tends to underpredict the finishing time in the earliest and latest years and overpredict it in the middle years; this residual pattern is indicative of a convex relationship in the original data.

Given these findings, it is reasonable to conclude that models based solely on a linear trend become unreliable over time. In addition to the impossibility of eventually achieving predicted times below \(0\) seconds, forecasts further into the future will become increasingly inaccurate.

Part d

Forecasts of the 2020 results for each of the races are shown below, along with prediction intervals. The key assumption in these calculations is that the downward trend in winning finishing times will continue for each race. While this assumption may hold in the near term, the decline is likely to become asymptotic, with finishing times falling by smaller and smaller amounts.
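A minimal sketch of the approach behind these forecasts, assuming olympic_running from fpp3 (its Year index steps by four years, so one step ahead of 2016 is 2020):

```r
library(fpp3)

# Trend-only regression per race/sex, forecast one Olympiad ahead (2020)
olympic_2020_fc <- olympic_running %>%
  model(tslm = TSLM(Time ~ trend())) %>%
  forecast(h = 1)
```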

## [1] "2020 FORECAST - 100m (men)"
## [1] "Point Estimate -  9.535"
## [1] "80% Prediction Interval - [9.21734294466143, 9.85167138870746]80"
## [1] "95% Prediction Interval - [9.04944634739106, 10.0195679859778]95"
## 
## [1] "2020 FORECAST - 100m (women)"
## [1] "Point Estimate -  10.534"
## [1] "80% Prediction Interval - [10.222243114615, 10.8465753693563]80"
## [1] "95% Prediction Interval - [10.056992348831, 11.0118261351403]95"
## 
## [1] "2020 FORECAST - 200m (men)"
## [1] "Point Estimate -  19.117"
## [1] "80% Prediction Interval - [18.6634266553788, 19.571450659354]80"
## [1] "95% Prediction Interval - [18.4230872193371, 19.8117900953958]95"
## 
## [1] "2020 FORECAST - 200m (women)"
## [1] "Point Estimate -  21.2"
## [1] "80% Prediction Interval - [20.484941464277, 21.914535659906]80"
## [1] "95% Prediction Interval - [20.1065507375542, 22.2929263866288]95"
## 
## [1] "2020 FORECAST - 400m (men)"
## [1] "Point Estimate -  42.042"
## [1] "80% Prediction Interval - [40.4902216789654, 43.5943991188899]80"
## [1] "95% Prediction Interval - [39.6685955413096, 44.4160252565457]95"
## 
## [1] "2020 FORECAST - 400m (women)"
## [1] "Point Estimate -  48.436"
## [1] "80% Prediction Interval - [46.9238950556008, 49.9486324169266]80"
## [1] "95% Prediction Interval - [46.1232954366339, 50.7492320358936]95"
## 
## [1] "2020 FORECAST - 800m (men)"
## [1] "Point Estimate -  99.238"
## [1] "80% Prediction Interval - [94.5950485767846, 103.880348988613]80"
## [1] "95% Prediction Interval - [92.1373780144569, 106.338019550941]95"
## 
## [1] "2020 FORECAST - 800m (women)"
## [1] "Point Estimate -  111.483"
## [1] "80% Prediction Interval - [106.659016016263, 116.307850947454]80"
## [1] "95% Prediction Interval - [104.105123678979, 118.861743284738]95"
## 
## [1] "2020 FORECAST - 1500m (men)"
## [1] "Point Estimate -  206.998"
## [1] "80% Prediction Interval - [195.943819100482, 218.052744851324]80"
## [1] "95% Prediction Interval - [190.091939939598, 223.904624012208]95"
## 
## [1] "2020 FORECAST - 1500m (women)"
## [1] "Point Estimate -  245.457"
## [1] "80% Prediction Interval - [237.826039948383, 253.086990354647]80"
## [1] "95% Prediction Interval - [233.786710373344, 257.126319929686]95"
## 
## [1] "2020 FORECAST - 5000m (men)"
## [1] "Point Estimate -  772.825"
## [1] "80% Prediction Interval - [751.188779487642, 794.460343744204]80"
## [1] "95% Prediction Interval - [739.735488270079, 805.913634961767]95"
## 
## [1] "2020 FORECAST - 5000m (women)"
## [1] "Point Estimate -  892.115"
## [1] "80% Prediction Interval - [841.472870188374, 942.756463144959]80"
## [1] "95% Prediction Interval - [814.664722490849, 969.564610842484]95"
## 
## [1] "2020 FORECAST - 10000m (men)"
## [1] "Point Estimate -  1580.351"
## [1] "80% Prediction Interval - [1540.43228694427, 1620.2701522962]80"
## [1] "95% Prediction Interval - [1519.30048046179, 1641.40195877868]95"
## 
## [1] "2020 FORECAST - 10000m (women)"
## [1] "Point Estimate -  1763.015"
## [1] "80% Prediction Interval - [1733.51292777808, 1792.51778650763]80"
## [1] "95% Prediction Interval - [1717.89528506303, 1808.13542922269]95"

Question 3

The following mathematical steps show that \({\beta}_1\) is the elasticity coefficient of \(log(y) = {\beta}_0 + {\beta}_1log(x) + {\epsilon}\).

\(log(y) = {\beta}_0 + {\beta}_1log(x) + {\epsilon}\) \(y = 10^{({\beta}_0 + {\beta}_1log(x) + {\epsilon})}\) \(y = 10^{({\beta}_0 + {\epsilon})}*10^{{\beta}_1log(x)}\) \(y = x^{{\beta}_1}*10^{({\beta}_0 + {\epsilon})}\)

It's important to note here that \(x^{{\beta}_1}*10^{({\beta}_0 + {\epsilon})} = y\). This relationship will be useful in the differentiation step.

\(dy/dx = {\beta}_1x^{{\beta}_1-1}*10^{({\beta}_0 + {\epsilon})} = {\beta}_1(x^{{\beta}_1}*10^{({\beta}_0 + {\epsilon})}/x) = {\beta}_1(y/x)\) \({\beta}_1 = (dy/dx)*(x/y)\)

The final equation is exactly the definition of elasticity, the percentage change in \(y\) per percentage change in \(x\); thus \({\beta}_1\) is the elasticity coefficient.

Question 4

Part a

A time plot of the souvenirs dataset is shown below; the multiplicative decomposition of the data using the X-11 method is shown next to it. The seasonally adjusted data has steadily increased. Seasonally, the data spikes at two points each year, corresponding with Christmas and the surfing festival; each event appears similar in popularity, as the seasonal spikes are roughly equal in magnitude.

The irregular component displays higher variability during the earlier years of the wharf shop's operations; over time, the magnitude of the irregularity has decreased.

Finally, a seasonal plot shows that sales volume tends to increase over the course of each year; the plot displays an exponentially increasing trend within each year and highlights the spike in sales during the surfing festival.

Part b

It is appropriate and likely necessary to take logarithms of the Sales data before fitting a model due to the convexity present in the underlying trend of sales volume over time; this is seen in the trend-cycle component of the data above. Convexity implies some form of exponential relationship with time; taking a logarithm of the Sales data will make the trend more linear.
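One way to support the choice of a log transformation is Guerrero's method for selecting a Box-Cox \(\lambda\); a \(\lambda\) near 0 corresponds to a log transformation. A sketch using the feasts feature:

```r
library(fpp3)

# Guerrero's method: lambda_guerrero close to 0 supports taking logs of Sales
souvenirs %>%
  features(Sales, features = guerrero)
```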

Part c

A regression model has been fit to the logarithm of the sales data with a linear trend, seasonal dummies, and a "surfing festival" dummy variable, as the code below shows. The final model is assigned to the variable souvenirs_tslm. Of note, the seasonal dummy was forced to be based on the month; using the season() option within the TSLM function resulted in the year being used as the seasonality measure.

#QUESTION 4, PART C
date_loop <- as.Date(souvenirs$Month)

#Surfing festival dummy: 1 in March from 1988 onward, 0 otherwise
surfing_loop <- as.integer(year(date_loop) >= 1988 & month(date_loop) == 3)

souvenirs_model <- 
  souvenirs %>%
  add_column(surf_dummy = surfing_loop) %>%
  add_column(log_sales = log(souvenirs$Sales)) %>%
  add_column(season_month = as.factor(month(souvenirs$Month)))

souvenirs_tslm <-
  souvenirs_model %>%
  model(tslm = TSLM(log_sales~trend()+season_month+surf_dummy))

Part d

Plots of the residual against time and against fitted values are shown below. The plot comparing residuals to fitted values is largely unremarkable and displays mostly consistent variability across fitted values.

On the other hand, the plot of residuals over time raises concerns. Although the plot looks sporadic, there appear to be runs of consecutive over- or under-prediction of sales volumes, especially in the later years. This suggests a degree of autocorrelation among the residuals.
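These diagnostics can be produced along the following lines (a sketch, using the souvenirs_tslm model fit in Part c):

```r
# Time plot, ACF, and histogram of the residuals in one panel
souvenirs_tslm %>% gg_tsresiduals()

# Residuals against fitted values
augment(souvenirs_tslm) %>%
  ggplot(aes(x = .fitted, y = .resid)) +
  geom_point()
```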

Part e

Boxplots of each month's residuals are plotted below. Overall, the plots show different levels of variability and skew by month. The winter months, primarily June and July, appear to be predicted most accurately by the model; this is expected, as the winter months are free of the other factors that might add variability to tourism. Given these observations about the residual distribution by month, the model is likely inadequate for accurately predicting sales volume, especially at the peak tourist points of the year, which would be its most valuable use.

Part f

The values of the coefficients produced in souvenirs_tslm are shown below.

The coefficients, multiplied by \(100\), approximate the percentage change in overall sales volume (as part of transforming \(log(y)\) back to \(y\)); this approximation is less accurate for the larger coefficients. In summary:

  • Sales volume is trending upwards approximately 2% each month;
  • Sales volume in February increases 25% relative to January;
  • Sales volume in March increases 27% relative to January;
    • The surfing festival increases sales volume by an additional 50% in this month
  • Sales volume in April increases 38% relative to January;
  • Sales volume in May increases 41% relative to January;
  • Sales volume in June increases 45% relative to January;
  • Sales volume in July increases 61% relative to January;
  • Sales volume in August increases 59% relative to January;
  • Sales volume in September increases 67% relative to January;
  • Sales volume in October increases 75% relative to January;
  • Sales volume in November increases 121% relative to January; and
  • Sales volume in December increases 196% relative to January.
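Strictly, a coefficient \(\beta\) on the (natural) log scale implies an exact percentage change of \((e^{\beta} - 1) \times 100\%\); the figures above use the small-coefficient approximation \(\beta \times 100\%\), which understates the larger effects. For example:

```r
# Exact percentage effect implied by a log-scale coefficient
(exp(0.2514168) - 1) * 100   # February: about 28.6%, vs. the 25% approximation
(exp(1.9622412) - 1) * 100   # December: about 612%, vs. the 196% approximation
```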
## Series: log_sales 
## Model: TSLM 
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.336727 -0.127571  0.002568  0.109106  0.376714 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    7.6196670  0.0742471 102.626  < 2e-16 ***
## trend()        0.0220198  0.0008268  26.634  < 2e-16 ***
## season_month2  0.2514168  0.0956790   2.628 0.010555 *  
## season_month3  0.2660828  0.1934044   1.376 0.173275    
## season_month4  0.3840535  0.0957075   4.013 0.000148 ***
## season_month5  0.4094870  0.0957325   4.277 5.88e-05 ***
## season_month6  0.4488283  0.0957647   4.687 1.33e-05 ***
## season_month7  0.6104545  0.0958039   6.372 1.71e-08 ***
## season_month8  0.5879644  0.0958503   6.134 4.53e-08 ***
## season_month9  0.6693299  0.0959037   6.979 1.36e-09 ***
## season_month10 0.7473919  0.0959643   7.788 4.48e-11 ***
## season_month11 1.2067479  0.0960319  12.566  < 2e-16 ***
## season_month12 1.9622412  0.0961066  20.417  < 2e-16 ***
## surf_dummy     0.5015151  0.1964273   2.553 0.012856 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.179 on 70 degrees of freedom
## Multiple R-squared: 0.9567,  Adjusted R-squared: 0.9487
## F-statistic:   119 on 13 and 70 DF, p-value: < 2.22e-16

Part g

The Ljung-Box test suggests a significant level of autocorrelation among the residuals, a result suspected from the plot of residuals over time and shown more clearly in the ACF plot below. Some aspect of the data has therefore not been properly captured by the model.
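The test can be run along these lines (a sketch; the lag choice of 24, two seasonal periods of monthly data, is an assumption):

```r
# Ljung-Box test on the residuals of souvenirs_tslm
augment(souvenirs_tslm) %>%
  features(.resid, ljung_box, lag = 24)
```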

## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 tslm      88.8         0

Part h

The monthly sales predictions are shown below, along with prediction intervals. Of note, the predicted point estimates and prediction intervals have been back-transformed to actual sales values, while the visualization of the forecast has been left on the \(log(Sales)\) scale.
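A sketch of the back-transformation, assuming souvenirs_tslm from Part c and a hypothetical future_souvenirs tsibble carrying surf_dummy and season_month values for 1994-1996:

```r
# Forecast on the log scale, then exponentiate to return to the sales scale
fc <- souvenirs_tslm %>%
  forecast(new_data = future_souvenirs) %>%
  hilo(level = c(80, 95))

exp(fc$.mean)  # point forecasts in actual sales dollars
```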

## [1] "PREDICTION INTERVALS (Adj) -  1994 Jan"
## [1] "Point Est. -  13244.6957316618"
## [1] "80% -  [10310.3999510918, 17014.0795562177]80"
## [1] "95% -  [9030.21351893809, 19426.1148594555]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1994 Feb"
## [1] "Point Est. -  17409.8093475714"
## [1] "80% -  [13552.7535764083, 22364.5667140591]80"
## [1] "95% -  [11869.9816830635, 25535.1246203911]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1994 Mar"
## [1] "Point Est. -  29821.6519127238"
## [1] "80% -  [23184.9417396238, 38358.1262696801]80"
## [1] "95% -  [20292.3566029472, 43825.9064831584]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1994 Apr"
## [1] "Point Est. -  20774.1649093691"
## [1] "80% -  [16171.753070439, 26686.4034963864]80"
## [1] "95% -  [14163.7942169395, 30469.655310688]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1994 May"
## [1] "Point Est. -  21783.7316642341"
## [1] "80% -  [16957.6553841555, 27983.2886368675]80"
## [1] "95% -  [14852.1153035658, 31950.3959887401]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1994 Jun"
## [1] "Point Est. -  23162.2680424653"
## [1] "80% -  [18030.7839553704, 29754.1505793052]80"
## [1] "95% -  [15791.9993213838, 33972.3077460213]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1994 Jul"
## [1] "Point Est. -  27831.5605561626"
## [1] "80% -  [21665.6181773278, 35752.3037954176]80"
## [1] "95% -  [18975.5158955147, 40820.8012502279]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1994 Aug"
## [1] "Point Est. -  27818.4753855121"
## [1] "80% -  [21655.4319604779, 35735.4946410992]80"
## [1] "95% -  [18966.5944459549, 40801.6091122459]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1994 Sep"
## [1] "Point Est. -  30848.4229410591"
## [1] "80% -  [24014.1098615369, 39627.7523271711]80"
## [1] "95% -  [21032.4081069185, 45245.6605593075]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1994 Oct"
## [1] "Point Est. -  34095.5730101108"
## [1] "80% -  [26541.870150745, 43799.0274342129]80"
## [1] "95% -  [23246.3101130986, 50008.2848947609]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1994 Nov"
## [1] "Point Est. -  55176.8420823205"
## [1] "80% -  [42952.6900000427, 70879.9356262507]80"
## [1] "95% -  [37619.487483807, 80928.3726549385]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1994 Dec"
## [1] "Point Est. -  120067.788974434"
## [1] "80% -  [93467.3737057144, 154238.568792994]80"
## [1] "95% -  [81862.0369355898, 176104.510574936]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1995 Jan"
## [1] "Point Est. -  17250.3958575239"
## [1] "80% -  [13389.9315511126, 22223.8744167853]80"
## [1] "95% -  [11709.461199182, 25413.309133478]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1995 Feb"
## [1] "Point Est. -  22675.1983687848"
## [1] "80% -  [17600.7180689423, 29212.7070639816]80"
## [1] "95% -  [15391.7833350609, 33405.1363556116]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1995 Mar"
## [1] "Point Est. -  38840.8545611208"
## [1] "80% -  [30119.8742825499, 50086.9282815089]80"
## [1] "95% -  [26326.435913602, 57304.0721497242]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1995 Apr"
## [1] "Point Est. -  27057.0631108894"
## [1] "80% -  [21001.9657532048, 34857.9115302546]80"
## [1] "95% -  [18366.165813093, 39860.5060869458]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1995 May"
## [1] "Point Est. -  28371.9612798511"
## [1] "80% -  [22022.6029968076, 36551.9092807539]80"
## [1] "95% -  [19258.7104954006, 41797.6160479496]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1995 Jun"
## [1] "Point Est. -  30167.4195304804"
## [1] "80% -  [23416.2558310595, 38865.017861689]80"
## [1] "95% -  [20477.4563661699, 44442.6878443509]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1995 Jul"
## [1] "Point Est. -  36248.883828924"
## [1] "80% -  [28136.7498625735, 46699.8351003801]80"
## [1] "95% -  [24605.5164307033, 53401.9101994223]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1995 Aug"
## [1] "Point Est. -  36231.8412046044"
## [1] "80% -  [28123.5212053893, 46677.8789003172]80"
## [1] "95% -  [24593.9480035292, 53376.8029796311]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1995 Sep"
## [1] "Point Est. -  40178.1602307014"
## [1] "80% -  [31186.6939044231, 51761.9650377549]80"
## [1] "95% -  [27272.6847639683, 59190.5261067897]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1995 Oct"
## [1] "Point Est. -  44407.3720778278"
## [1] "80% -  [34469.4508693004, 57210.5051030847]80"
## [1] "95% -  [30143.4474082614, 65421.0073635496]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1995 Nov"
## [1] "Point Est. -  71864.4193397935"
## [1] "80% -  [55781.8883617439, 92583.7206075579]80"
## [1] "95% -  [48781.1199703009, 105870.770703706]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1995 Dec"
## [1] "Point Est. -  156380.858534587"
## [1] "80% -  [121384.402362515, 201467.177331232]80"
## [1] "95% -  [106150.352167533, 230380.516094927]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1996 Jan"
## [1] "Point Est. -  22467.5721715461"
## [1] "80% -  [17378.8740101132, 29046.2891318436]80"
## [1] "95% -  [15169.764241924, 33276.1795920705]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1996 Feb"
## [1] "Point Est. -  29533.0414480078"
## [1] "80% -  [22844.0795712847, 38180.594427018]80"
## [1] "95% -  [19940.262022642, 43740.675833616]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1996 Mar"
## [1] "Point Est. -  50587.8073908598"
## [1] "80% -  [39105.5421282527, 65441.5235626111]80"
## [1] "95% -  [34123.2958650726, 74996.4559910569]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1996 Apr"
## [1] "Point Est. -  35240.1488762845"
## [1] "80% -  [27258.5797318232, 45558.7967179694]80"
## [1] "95% -  [23793.6144689705, 52193.3350833362]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1996 May"
## [1] "Point Est. -  36952.7223008819"
## [1] "80% -  [28583.2710492429, 47772.8277877512]80"
## [1] "95% -  [24949.9180917993, 54729.7863031829]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1996 Jun"
## [1] "Point Est. -  39291.1954675377"
## [1] "80% -  [30392.1015819344, 50796.0279451659]80"
## [1] "95% -  [26528.8197351714, 58193.2425445042]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1996 Jul"
## [1] "Point Est. -  47211.92605033"
## [1] "80% -  [36518.8596408549, 61036.0231207278]80"
## [1] "95% -  [31876.7769887338, 69924.4456919097]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1996 Aug"
## [1] "Point Est. -  47189.7290877179"
## [1] "80% -  [36501.6900858314, 61007.3266781858]80"
## [1] "95% -  [31861.7899359654, 69891.5703056134]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1996 Sep"
## [1] "Point Est. -  52329.5651971663"
## [1] "80% -  [40477.4006564581, 67652.1552598159]80"
## [1] "95% -  [35332.1293846222, 77504.0576783442]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1996 Oct"
## [1] "Point Est. -  57837.8516845529"
## [1] "80% -  [44738.110987997, 74773.3199638637]80"
## [1] "95% -  [39051.2409447245, 85662.2480248288]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1996 Nov"
## [1] "Point Est. -  93598.9551439093"
## [1] "80% -  [72399.6538880286, 121005.611678486]80"
## [1] "95% -  [63196.5960532982, 138627.156384229]95"
## 
## [1] "PREDICTION INTERVALS (Adj) -  1996 Dec"
## [1] "Point Est. -  203676.382524388"
## [1] "80% -  [157545.557824435, 263314.747626524]80"
## [1] "95% -  [137519.206835167, 301660.180806194]95"

Part i

The model could likely be improved by adding a different time element. For example, instead of predicting next month's sales volumes from the trend and seasonal dummies alone, it may be more appropriate to include last year's sales volume in the same month as a predictor. This approach would, to a degree, eliminate the need for the dummy variables, as the lagged term would capture those effects directly.
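A sketch of this idea, regressing log sales on its own value twelve months earlier (the lagged-column name is hypothetical):

```r
# Add a 12-month lag of log sales as a predictor; the first 12 rows are NA
souvenirs_lag_tslm <- souvenirs_model %>%
  mutate(log_sales_lag12 = lag(log_sales, 12)) %>%
  model(tslm = TSLM(log_sales ~ trend() + log_sales_lag12))
```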

Question 5

Part a

Some of the results of a harmonic regression (K = 4) and six others with Fourier terms ranging from 5 to 10 are shown below. Overall, the fitted values are close to the observed values in the training data, as shown by the scatterplots on the right side of the panel of graphs. Of note, none of the models appear particularly accurate at predicting the peaks and valleys of finished barrels.

If these charts served as the only source of model validation, it would be difficult to discern which model is most appropriate.

Part b

As alluded to in Part a, comparing fitted to observed values alone is not enough to validate a model for supplies of US finished motor gasoline product. The graphs below display the \(CV\) and \(AIC_c\) values associated with every Fourier model (K = 1 through K = 26). The red point on each graph marks the K value at which the \(AIC_c\) or \(CV\) reaches its minimum. For the US gasoline data, both metrics are minimized when 7 Fourier terms are included in the model.
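A sketch of how such metrics can be gathered, fitting one harmonic regression per K and collecting \(AIC_c\) and \(CV\) from glance() (the us_gasoline data name is taken from fpp3; Barrels is its measured column, and the pre-2005 training cutoff is an assumption):

```r
library(fpp3)

# Fit TSLM with trend + K Fourier pairs for each K, keep model-fit metrics
gas_metrics <- purrr::map_dfr(1:26, function(K) {
  us_gasoline %>%
    filter(year(Week) < 2005) %>%
    model(tslm = TSLM(Barrels ~ trend() + fourier(K = K))) %>%
    glance() %>%
    mutate(K = K)
})
```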

Part c

The residual plots of a trend model with 7 Fourier terms are shown below. Overall, residuals over time show a largely random pattern and strong normality. Nevertheless, the ACF plot does suggest potentially strong autocorrelation among residuals. The results of a Ljung-Box test align with this visual finding, and suggest statistically significant autocorrelation between residuals.

## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 tslm      34.6  0.000285

Part d

The graph below shows the forecasts generated by the model with 7 Fourier terms (grey dotted line). The light blue and dark blue dotted lines represent the boundaries of the 95% and 80% prediction intervals, respectively. Finally, the black line shows the actual observed values for 2005.

Overall, the forecast values for 2005 are reasonable. Actual observations fell within the 80% prediction interval more often than not, and that interval appears to have an average width of roughly 750,000 barrels per day. Further, as the accompanying metric indicates, the 2005 forecast had an average accuracy ratio of 100.5%.

## [1] 1.005491

Question 6

Part a

A plot of Afghanistan's population over time is shown below. The effects of the Soviet-Afghan War are visible where the population line dips between the late 1970s and late 1980s. Aside from that wartime dip, the Afghan population has been steadily increasing, although the rate of increase between 1960 and the late 1970s appears to differ from the rate from the late 1980s to the 2010s.

There does not appear to be strong seasonality or irregularity in the data, aside from the aforementioned Soviet-Afghan War.

Part b

The results of a linear and a piecewise model are shown below, along with the code generating the models; both are stored in the afghan_tslm variable.

The adjusted \(R^2\) is strongest for the piecewise model, with \(99.85\%\) of the variability in population explained; the same metric for the linear model is lower, at \(83.52\%\). Nevertheless, caution should be exercised when using the piecewise model for forecasting. More data would be required to confirm that the chosen knots truly represent unique moments in Afghan history. Otherwise, a linear model might be more appropriate.

#QUESTION 6, PART B
afghan_tslm <-
  afghan_economy %>%
  model(linear = TSLM(Population~trend()),
        piecewise = TSLM(Population~trend(knots = c(1980, 1989))))

afghan_tslm %>%
  select(linear) %>%
  report()
## Series: Population 
## Model: TSLM 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7945 -2.5826  0.7448  2.2592  6.0363 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.79890    0.84826   5.657 5.45e-07 ***
## trend()      0.42577    0.02501  17.025  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.188 on 56 degrees of freedom
## Multiple R-squared: 0.8381,  Adjusted R-squared: 0.8352
## F-statistic: 289.9 on 1 and 56 DF, p-value: < 2.22e-16
afghan_tslm %>%
  select(piecewise) %>%
  report()
## Series: Population 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.57759 -0.17420 -0.01678  0.18723  0.67995 
## 
## Coefficients:
##                                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                           8.697573   0.131122   66.33   <2e-16 ***
## trend(knots = c(1980, 1989))trend     0.224372   0.009623   23.32   <2e-16 ***
## trend(knots = c(1980, 1989))trend_21 -0.456803   0.024498  -18.65   <2e-16 ***
## trend(knots = c(1980, 1989))trend_30  1.082782   0.021418   50.55   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3009 on 54 degrees of freedom
## Multiple R-squared: 0.9986,  Adjusted R-squared: 0.9985
## F-statistic: 1.293e+04 on 3 and 54 DF, p-value: < 2.22e-16

Part c

Forecasts of the Afghan population for the 5 years following the end of the data (2018-2022) are shown graphically below. With its large adjusted \(R^2\), it is no surprise that the piecewise model fits the training data almost perfectly, while the linear model does not fit it well. Further, the piecewise model gives forecasts in a much narrower range, whereas the linear model suggests a wide range of variability.

With these considerations in mind, the piecewise model is clearly preferable for the data available. However, caution should still be exercised: the addition of data from earlier years might diminish the currently apparent rate of population change. That said, Afghanistan has displayed a nearly constant rate of change since 1989, so it is reasonable to expect the trend to continue, at least in the near term.
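The forecasts can be generated along the following lines (a sketch, using afghan_tslm and the afghan_economy data from Part b):

```r
# Forecast both models 5 years ahead and overlay on the historical data
afghan_tslm %>%
  forecast(h = 5) %>%
  autoplot(afghan_economy)
```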

R Code

knitr::opts_chunk$set(echo = TRUE)

library("feasts")
library("seasonal")
library("tsibble")
library("tsibbledata")
library("dplyr")
library("ggplot2")
library("forecast")
library("fable")
library("fpp3")

#QUESTION 1, PART A
jan14_vic_elec <- vic_elec %>%
                    filter(yearmonth(Time) == yearmonth("2014 Jan")) %>%
                    mutate(Demand = Demand/1000) %>%
                    index_by(Date = as.Date(Time)) %>%
                    summarize(Demand = sum(Demand),
                              Temperature = max(Temperature))


#Plot of temperature vs. electricity demand
ggplot(data = jan14_vic_elec,
       aes(x = Temperature,
           y = Demand)) +
  geom_point() +
  labs(x = "Temperature",
       y = "Jan 2014 Electricity Demand",
       title = "Victoria Jan 2014 Electricity Demand",
       subtitle = "Temperature Relationship")

#Plot of date vs. electricity demand
autoplot(jan14_vic_elec, Demand) +
  labs(x = "Date",
       y = "Jan 2014 Electricity Demand",
       title = "Victoria Jan 2014 Electricity Demand",
       subtitle = "Movement over Month")


jan14_vic_elec_tslm <-
  jan14_vic_elec %>%
  model(tslm = TSLM(Demand~Temperature))

report(jan14_vic_elec_tslm)

#QUESTION 1, PART B
jan14_vic_elec_tslm %>%
  gg_tsresiduals()

plot(residuals(jan14_vic_elec_tslm)$".resid"~jan14_vic_elec$Temperature,
     xlab = "Temperature",
     ylab = "TSLM Residuals",
     main = "TSLM Residuals vs. Temp")
abline(h = 0, col = "red", lty = "dashed")

plot(residuals(jan14_vic_elec_tslm)$".resid"~fitted(jan14_vic_elec_tslm)$".fitted",
     xlab = "TSLM Fitted",
     ylab = "TSLM Residuals",
     main = "TSLM Residuals vs. TSLM Fitted")
abline(h = 0, col = "red", lty = "dashed")

#QUESTION 1, PART C
Temp_scenarios <- scenarios(
  
  Temp15 = new_data(jan14_vic_elec, 1) %>%
    mutate(Temperature = 15),
  
  Temp35 = new_data(jan14_vic_elec, 1) %>%
    mutate(Temperature = 35),
  
  names_to = "Scenario"
  
)

Temp_forecast <- jan14_vic_elec_tslm %>%
                  forecast(new_data = Temp_scenarios)
  
jan14_vic_elec %>%
  autoplot(Demand) +
  autolayer(Temp_forecast) +
  labs(title = "Temperature Forecast",
       subtitle = "15 and 35 Degrees",
       x = "Date",
       y = "Electricity Demand")

#QUESTION 1, PART D
Temp_hilo <- jan14_vic_elec_tslm %>%
              forecast(new_data = Temp_scenarios) %>%
              hilo()

print(paste("Temp15 Demand (GW) Forecast, 80% Pred. Interval - ", round(Temp_hilo$"80%"[1], 2)))

print(paste("Temp15 Demand (GW) Forecast, 95% Pred. Interval - ", round(Temp_hilo$"95%"[1], 2)))

print(paste("Temp35 Demand (GW) Forecast, 80% Pred. Interval - ", round(Temp_hilo$"80%"[2], 2)))

print(paste("Temp35 Demand (GW) Forecast, 95% Pred. Interval - ", round(Temp_hilo$"95%"[2], 2)))

#QUESTION 1, PART E
#Plot of temperature vs. electricity demand
ggplot(data = vic_elec,
       aes(x = Temperature,
           y = Demand)) +
  geom_point() +
  labs(x = "Temperature",
       y = "Electricity Demand (MW)",
       title = "Victoria Electricity Demand",
       subtitle = "Temperature Relationship")

#QUESTION 2, PART A
length_vector <- as.vector(unique(olympic_running$Length))

olympic_running_model <-
  
  olympic_running %>%
  filter(Year != 1916,
         Year != 1940,
         Year != 1944)

for(i in 1:length(length_vector)) {
  
  olympic_loop <- 
    
    olympic_running_model %>%
    filter(Length == length_vector[i]) %>%
    autoplot(Time) +
    labs(x = "Year",
         y = "Time (sec)",
         title = "Winning Running Times (sec)",
         subtitle = paste(length_vector[i], "Meter Race by Sex"))
  
  print(olympic_loop)
  
}

#QUESTION 2, PART B
olympic_running_tslm <-
  olympic_running_model %>%
    model(TSLM(Time~trend()))

olympic_Length <- unique(olympic_running_tslm$Length)
olympic_Sex <- unique(olympic_running_tslm$Sex)
olympic_running_Coeff <- c()
olympic_running_CoeffN <- c()

for(i in 1:length(olympic_Sex)) {
  
  olympic_running_loop1 <-
    olympic_running_tslm %>%
    filter(Sex == olympic_Sex[i])
  
  for(j in 1:length(olympic_Length)) {
    
    olympic_running_loop2 <-
      olympic_running_loop1 %>%
      filter(Length == olympic_Length[j]) %>%
      coefficients()
    
    olympic_running_CoeffN[length(olympic_running_CoeffN)+1] <- 
      paste0("Race", 
             olympic_Length[j], 
             "_", 
             olympic_Sex[i])
    
    olympic_running_Coeff[length(olympic_running_Coeff)+1] <- 
           subset(olympic_running_loop2, 
                  olympic_running_loop2$term=="trend()")$estimate
    
  }
  
}

olympic_running_tslm_CoeffSumm <- data.frame(olympic_running_CoeffN,
                                             round(olympic_running_Coeff, 2))

colnames(olympic_running_tslm_CoeffSumm) <- c("Race Description", 
                                              "Avg. Change (sec)")

print(olympic_running_tslm_CoeffSumm)

#QUESTION 2, PART C
for(i in 1:length(olympic_Sex)) {
  
  olympic_residual_loop1 <- 
    residuals(olympic_running_tslm) %>%
    filter(Sex == olympic_Sex[i])
  
  olympic_running_loop1 <-
    olympic_running_model %>%
    filter(Sex == olympic_Sex[i])
  
  for(j in 1:length(olympic_Length)) {
    
    olympic_residual_loop2 <-
      olympic_residual_loop1 %>%
      filter(Length == olympic_Length[j])
    
    olympic_running_loop2 <-
      olympic_running_loop1 %>%
      filter(Length == olympic_Length[j])
    
    plot(x = olympic_running_loop2$Year,
         y = olympic_residual_loop2$".resid",
         xlab = "Year",
         ylab = "Regression Residual",
         main = "Residuals vs. Year",
         sub = paste0("Race", 
                           olympic_Length[j],
                           "_",
                           olympic_Sex[i]))
    abline(h = 0, col = "red", lty = "dashed")
    
  }
  
}

#QUESTION 2, PART D
olympic_forecast <- olympic_running_tslm %>%
  forecast(new_data(olympic_running_model, 1)) %>%
  hilo()

for(i in 1:nrow(olympic_forecast)) {
  
  print(paste("2020 FORECAST -",
              paste0(olympic_forecast[i,1], "m"),
              paste0("(", olympic_forecast[i,2], ")"))
  )
  
  print(
    paste("Point Estimate - ", 
          round(olympic_forecast[i,6], 3))
  )
  
  print(
    paste("80% Prediction Interval -",
          olympic_forecast$"80%"[i])
  )
  
  print(
    paste("95% Prediction Interval -",
          olympic_forecast$"95%"[i])
  )
  
  cat("\n")
}

#QUESTION 4, PART A
souvenirs %>%
  mutate(Month = yearmonth(Month)) %>%
  autoplot(Sales) +
  labs(x = "Month",
       y = "Sales",
       title = "Monthly Sales",
       subtitle = "Wharf Shop, Queensland")

souvenirs_x11 <- souvenirs %>%
                  mutate(Month = yearmonth(Month)) %>%
                  model(x11 = X_13ARIMA_SEATS(Sales ~ x11())) %>%
                  components()

autoplot(souvenirs_x11) +
  labs(title = "X11 Decomp - Souvenirs Dataset")

souvenirs %>%
  gg_season(Sales) +
  labs(y = "Sales Volume",
       title = "Seasonal Plot: Sales Volume")

#QUESTION 4, PART C
date_loop <- as.Date(souvenirs$Month)

#Dummy: 1 for March from 1988 onward (first year of the surfing festival), else 0
surfing_loop <- as.integer(year(date_loop) >= 1988 & month(date_loop) == 3)

souvenirs_model <- 
  souvenirs %>%
  add_column(surf_dummy = surfing_loop) %>%
  add_column(log_sales = log(souvenirs$Sales)) %>%
  add_column(season_month = as.factor(month(souvenirs$Month)))

souvenirs_tslm <-
  souvenirs_model %>%
  model(tslm = TSLM(log_sales~trend()+season_month+surf_dummy))

#QUESTION 4, PART D
ggp_restime <- 
  ggplot(data = souvenirs_model,
         aes(x = Month,
             y = residuals(souvenirs_tslm)$".resid")) +
  geom_line() +
  geom_point() +
  labs(x = "Month",
       y = "Regression Residuals",
       title = "Residuals vs. Time")

print(ggp_restime)

ggp_resfitted <-
  ggplot(data = souvenirs_model,
         aes(x = fitted(souvenirs_tslm)$".fitted",
             y = residuals(souvenirs_tslm)$".resid")) +
  geom_point() +
  labs(x = "Regression Fitted",
       y = "Regression Residuals",
       title = "Residuals vs. Fitted")

print(ggp_resfitted) 

#QUESTION 4, PART E
souvenir_tslm_res <-
  residuals(souvenirs_tslm) %>%
    add_column(Month_Val = as.factor(month(residuals(souvenirs_tslm)$Month)))

ggplot(data = souvenir_tslm_res,
       aes(x = Month_Val,
           y = .resid,
           group = Month_Val)) +
  geom_boxplot() +
  labs(x = "Month (#)",
       y = "Residual",
       title = "Boxplots of Residuals",
       subtitle = "by Month")

#QUESTION 4, PART F
report(souvenirs_tslm)

#QUESTION 4, PART G
augment(souvenirs_tslm) %>%
  features(.innov, ljung_box, lag = 20, dof = 14)

ACF_res <-
  augment(souvenirs_tslm) %>%
    ACF(.innov) %>%
    autoplot() +
    labs(x = "lag",
         y = "ACF",
         title = "ACF Plot",
         subtitle = "Sales Model Residuals")

print(ACF_res)

#QUESTION 4, PART H
souvenir_forecast_months <- 
  new_data(souvenirs_model, 36)

#Dummy: 1 for March (surfing festival), else 0
surf_dummy <- as.integer(month(souvenir_forecast_months$Month) == 3)

souvenir_forecast_months <-
  souvenir_forecast_months %>%
  add_column(surf_dummy = surf_dummy) %>%
  add_column(season_month = as.factor(month(souvenir_forecast_months$Month)))

souvenir_forecast <- souvenirs_tslm %>%
  forecast(souvenir_forecast_months)

souvenirs_forecast_plot <-
  souvenirs_model %>%
  autoplot(log_sales) +
  autolayer(souvenir_forecast) +
  labs(x = "Month",
       y = "log(Sales) Volume",
       title = "Sales Volume",
       subtitle = "Actual + Forecast (log)")

print(souvenirs_forecast_plot)

souvenir_forecast_hilo <-
  souvenirs_tslm %>%
  forecast(souvenir_forecast_months) %>%
  hilo()

for(i in 1:length(rownames(souvenir_forecast_hilo))){
  
  print(paste("PREDICTION INTERVALS (Adj) - ",
              souvenir_forecast_hilo$Month[i]))
  
  print(paste("Point Est. - ",
              exp(souvenir_forecast_hilo$.mean[i])))
  
  print(paste("80% - ",
              exp(souvenir_forecast_hilo$`80%`[i])))
  
  print(paste("95% - ",
              exp(souvenir_forecast_hilo$`95%`[i])))
  
  cat("\n")
  
}

#QUESTION 5, PART A
us_gasoline_q5 <-
  us_gasoline %>%
  filter(year(Week) <= 2004)


for(i in 4:10) {

  us_gasoline_prod <-
    us_gasoline_q5 %>%
    model(tslm = TSLM(Barrels~trend() + fourier(K=i)))

  us_gasoline_fitted <-
    data.frame(x = us_gasoline_q5$Week,
               y = fitted(us_gasoline_prod)$".fitted")
  
  colnames(us_gasoline_fitted) <- c("Week", "Barrels")
  
  ggp_obs <-
    ggplot(data = us_gasoline_q5,
           aes(x = Week,
               y = Barrels)) +
    geom_line(col = "grey") +
    theme_bw()
  
  ggp_fit_obs_time <- 
    ggp_obs +
      geom_line(data = us_gasoline_fitted,
                col = "red",
                lty = "dashed") +
    labs(title = "Observed and Fitted Values",
         subtitle = paste("K = ", i))
  
  print(ggp_fit_obs_time)
  
  ggp_fit_obs <-
    ggplot(data = us_gasoline_q5,
           aes(x = Barrels,
               y = fitted(us_gasoline_prod)$".fitted")) +
    geom_point() +
    geom_abline(slope = 1,
                intercept = 0,
                col = "red",
                lty = "dashed") +
    labs(x = "Observed",
         y = "Fitted",
         title = "Fitted vs. Observed",
         subtitle = paste("K = ", i))
  
  print(ggp_fit_obs)
  
}

#QUESTION 5, PART B
AICc_vector <- c()
CV_vector <- c()
Fourier_vector <- c()

#m = 52 therefore K <= 26
for(i in 1:26) {
  
  us_gasoline_prod <-
    us_gasoline_q5 %>%
    model(tslm = TSLM(Barrels~trend() + fourier(K=i)))
  
  AICc_vector[i] <-
    glance(us_gasoline_prod)$AICc
  
  CV_vector[i] <-
    glance(us_gasoline_prod)$CV
  
  Fourier_vector[i] <- i
  
}

Fourier_df <- data.frame(cbind(Fourier_vector,
                               CV_vector,
                               AICc_vector))

colnames(Fourier_df) <- c("Fourier_Value", "CV_Value", "AICc_Value")

#Note: cat("\n") inside paste() prints to the console and contributes nothing
#to the label; a literal "\n" produces the intended line break
Fourier_df$AICc_lab <- paste0("K = ", Fourier_df$Fourier_Value, "\n",
                              "AICc = ", round(Fourier_df$AICc_Value, 2))

Fourier_df$CV_lab <- paste0("K = ", Fourier_df$Fourier_Value, "\n",
                            "CV = ", round(Fourier_df$CV_Value, 2))

ggp_AICc <- 
  ggplot(data = Fourier_df,
         aes(x = Fourier_Value,
             y = AICc_Value)) +
  geom_point() +
  geom_point(data = Fourier_df[which.min(Fourier_df$AICc_Value),], 
             color = "red",
             size = 3) +
  geom_text(data = Fourier_df[which.min(Fourier_df$AICc_Value),],
            aes(Fourier_Value, AICc_Value, label = AICc_lab)) +
  labs(x = "Fourier Value",
       y = "AICc Value",
       title = "AICc Value vs. Fourier Value",
       subtitle = "US Gasoline")

print(ggp_AICc)

ggp_CV <- 
  ggplot(data = Fourier_df,
         aes(x = Fourier_Value,
             y = CV_Value)) +
  geom_point() +
  geom_point(data = Fourier_df[which.min(Fourier_df$CV_Value),], 
             color = "red",
             size = 3) +
  geom_text(data = Fourier_df[which.min(Fourier_df$CV_Value),],
            aes(Fourier_Value, CV_Value, label = CV_lab)) +
  labs(x = "Fourier Value",
       y = "CV Value",
       title = "CV Value vs. Fourier Value",
       subtitle = "US Gasoline")

print(ggp_CV)

#QUESTION 5, PART C
us_gasoline_prod7 <-
  us_gasoline_q5 %>%
  model(tslm = TSLM(Barrels~trend() + fourier(K=7)))

us_gasoline_prod7 %>%
  gg_tsresiduals()

augment(us_gasoline_prod7) %>%
  features(.innov, ljung_box, lag = 20, dof = 9)

#QUESTION 5, PART D
us_gasoline_forecast05 <-
  us_gasoline_prod7 %>%
  forecast(new_data(us_gasoline_q5, 52)) %>%
  hilo()

#Name columns directly so the lower/upper bounds are labeled correctly
us_gas05 <- data.frame(
                Week    = us_gasoline_forecast05$Week,
                Mean    = us_gasoline_forecast05$.mean,
                Lower80 = unpack_hilo(us_gasoline_forecast05, "80%")$`80%_lower`,
                Upper80 = unpack_hilo(us_gasoline_forecast05, "80%")$`80%_upper`,
                Lower95 = unpack_hilo(us_gasoline_forecast05, "95%")$`95%_lower`,
                Upper95 = unpack_hilo(us_gasoline_forecast05, "95%")$`95%_upper`)

us_gas05_80L <- 
  us_gas05 %>%
  select(Week, Barrels = Lower80)

us_gas05_80U <- 
  us_gas05 %>%
  select(Week, Barrels = Upper80)

us_gas05_95L <- 
  us_gas05 %>%
  select(Week, Barrels = Lower95)

us_gas05_95U <- 
  us_gas05 %>%
  select(Week, Barrels = Upper95)

us_gas05_point <- 
  us_gas05 %>%
  select(Week, Barrels = Mean)

us_gasoline %>%
  filter(year(Week)==2005) %>%
  ggplot(aes(x = Week,
             y = Barrels)) +
  geom_line() +
  labs(x = "Week",
       y = "Barrels",
       title = "Barrels w/ Time",
       subtitle = "US, 2005 (to forecast)") +
  geom_line(data = us_gas05_95L, col = "light blue", lty = "dashed") +
  geom_line(data = us_gas05_80L, col = "dark blue", lty = "dashed") +
  geom_line(data = us_gas05_point, col = "dark grey", lty = "dashed") +
  geom_line(data = us_gas05_80U, col = "dark blue", lty = "dashed") +
  geom_line(data = us_gas05_95U, col = "light blue", lty = "dashed") +
  theme_bw()

us_gasoline_05A <- 
  us_gasoline %>%
  filter(year(Week) == 2005)

Actual_2005 <- us_gasoline_05A$Barrels
Point_2005 <- us_gas05$Mean

#Mean ratio of point forecasts to actuals; values near 1 indicate accurate forecasts
Point_2005_accuracy <- Point_2005/Actual_2005

print(mean(Point_2005_accuracy))

#QUESTION 6 PART A
afghan_economy <-
  global_economy %>%
    filter(Code == "AFG") %>%
    mutate(Population = Population/1000000) 

afghan_economy %>%
  autoplot(Population) +
  labs(x = "Year",
       y = "Population (millions)",
       title = "Population over Time",
       subtitle = "Afghanistan")

#QUESTION 6, PART B
afghan_tslm <-
  afghan_economy %>%
  model(linear = TSLM(Population~trend()),
        piecewise = TSLM(Population~trend(knots = c(1980, 1989))))

afghan_tslm %>%
  select(linear) %>%
  report()

afghan_tslm %>%
  select(piecewise) %>%
  report()

#QUESTION 6, PART C
afghan_trend <- 
  afghan_tslm %>%
  forecast(h = 5)

afghan_economy %>%
  autoplot(Population) +
  geom_line(data = fitted(afghan_tslm),
            aes(y = .fitted, colour = .model)) +
  autolayer(afghan_trend, alpha = 0.5, level = 95) +
  labs(y = "Population",
       title = "Population",
       subtitle = "Afghanistan")